Boundedness of iterates in Q-Learning

Author

  • Abhijit Gosavi
Abstract

Reinforcement Learning (RL) is a simulation-based counterpart of stochastic dynamic programming. In recent years it has been used to solve complex Markov decision problems (MDPs). Watkins' Q-Learning is by far the most popular RL algorithm for solving discounted-reward MDPs. The boundedness of the iterates in Q-Learning plays a critical role in its convergence analysis and in keeping the algorithm stable, which makes it attractive for numerical solution of MDPs. Previous results show boundedness asymptotically, in an almost sure sense. We present a new result that shows boundedness in an absolute sense under weaker conditions on the step size; the proof rests on simple induction arguments.
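
For context, below is a minimal sketch of the Watkins' Q-Learning update for a discounted-reward MDP that the abstract refers to. The simulator interface `sample_step`, the uniform exploration, and the $1/k$ step size are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Minimal sketch of Watkins' Q-Learning for a discounted-reward MDP.
# `sample_step`, the uniform exploration, and the 1/k step size are
# illustrative assumptions, not the paper's exact setup.

def q_learning(sample_step, n_states, n_actions, gamma=0.95, n_iters=100_000):
    """sample_step(s, a) -> (reward, next_state): a simulator of the MDP."""
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_iters):
        a = np.random.randint(n_actions)            # uniform exploration
        reward, s_next = sample_step(s, a)
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]                  # step size in (0, 1]
        # Watkins' update: move Q(s, a) toward the sampled Bellman target.
        target = reward + gamma * Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
    return Q
```

If every reward satisfies $|r| \le R_{\max}$ and $Q$ starts at zero, each sampled target $r + \gamma \max_a Q(s', a)$ is bounded by $R_{\max} + \gamma \|Q\|_\infty$, and since each update is a convex combination of the old value and the target, induction gives $\|Q\|_\infty \le R_{\max}/(1-\gamma)$ at every iterate. This is the flavor of absolute bound via induction that the abstract alludes to.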


Similar articles

On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other ...
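
For intuition, here is a hypothetical sketch of the undiscounted Q-Learning update that such an SSP analysis concerns: total expected cost is minimized, and the absorbing, cost-free terminal state keeps a cost-to-go of zero. All names are illustrative, not the paper's notation.

```python
import numpy as np

# Hypothetical single-step Q-Learning update for a stochastic shortest
# path (SSP) problem: total undiscounted cost is minimized, and the
# absorbing, cost-free terminal state keeps a cost-to-go of zero.

def ssp_q_update(Q, s, a, cost, s_next, alpha, terminal):
    if s_next == terminal:
        target = cost                      # no cost accrues after absorption
    else:
        target = cost + Q[s_next].min()    # minimize total expected cost
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```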


Stochastic Shortest Path Games and Q-Learning

We consider a class of two-player zero-sum stochastic games with finite state and compact control spaces, which we call stochastic shortest path (SSP) games. They are total cost stochastic dynamic games that have a cost-free termination state. Based on their close connection to single-player SSP problems, we introduce model conditions that characterize a general subclass of these games that have...


Q-Learning Algorithms with Random Truncation Bounds and Applications to Effective Parallel Computing

Motivated by an important problem of load balancing in parallel computing, this paper examines a modified algorithm to enhance Q-learning methods, especially in asynchronous recursive procedures for self-adaptive load distribution at runtime. Unlike the existing projection method that utilizes a fixed region, our algorithm employs a sequence of growing truncation bounds to ensure the boundednes...
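
A hedged sketch of the growing-truncation idea described in this excerpt: after each raw update, the iterate is projected onto a box whose radius grows with the truncation index, rather than onto a fixed region. The schedule `growth` below is an invented placeholder, not the paper's construction.

```python
import numpy as np

# Illustrative sketch of a growing truncation bound: after each raw
# update, the iterate is projected onto a box [-M_j, M_j] whose radius
# grows with the truncation index j, instead of onto a fixed region.
# The schedule `growth` is an invented placeholder.

def truncated_update(q, delta, j, growth=lambda j: 10.0 * (j + 1)):
    M_j = growth(j)                        # current truncation radius
    return np.clip(q + delta, -M_j, M_j)   # project back into the box
```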


Stochastic approximation for non-expansive maps: application to Q-learning algorithms

We discuss synchronous and asynchronous iterations of the form $x_{k+1} = x_k + \gamma(k)\,(h(x_k) + w_k)$, where $h$ is a suitable map and $\{w_k\}$ is a deterministic or stochastic sequence satisfying suitable conditions. In particular, in the stochastic case, these are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's lemma for the synchronous case or on...


Stochastic Approximation for Non-Expansive Maps: Application to Q-Learning Algorithms

We discuss synchronous and asynchronous variants of fixed point iterations of the form $x_{k+1} = x_k + \gamma(k)\,(F(x_k, \xi_k) - x_k)$, where $F$ is a non-expansive mapping under a suitable norm and $\{\xi_k\}$ is a stochastic sequence. These are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's Lemma for the synchronous case or Borkar's Theorem fo...
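
A minimal runnable sketch of the synchronous iteration above, under the stated assumptions; `F_noisy` and the $1/(k+1)$ step-size schedule are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the synchronous stochastic approximation iteration
# x_{k+1} = x_k + gamma(k) * (F(x_k, xi_k) - x_k) from the excerpt above.
# `F_noisy` and the 1/(k+1) step-size schedule are illustrative choices.

def stochastic_approximation(F_noisy, x0, n_iters=10_000):
    """F_noisy(x, k) -> a noisy evaluation of a non-expansive map F."""
    x = np.asarray(x0, dtype=float)
    for k in range(n_iters):
        gamma = 1.0 / (k + 1)              # diminishing step size
        x = x + gamma * (F_noisy(x, k) - x)
    return x
```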



Journal:
  • Systems & Control Letters

Volume 55, Issue: -

Pages: -

Publication date: 2006